Fluctuation-preserving coarse graining for biochemical systems 
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Finite stochastic Markov models play a major role in modeling biological systems. Such models 
are a coarse-grained description ol the underlying microscopic dynamics and can be considered 
mesoscopic. The level of coarse-graining is to a certain extend arbitrary since it depends on the 
resolution of accomodating measurements. Here, we present a systematic way to simplify such 
stochastic descriptions which preserves both the meso-micro and the meso-macro connection. The 
former is achieved by demanding locality, the latter by considering cycles on the network of states. 
Our method preserves fluctuations of observables much better than naive approaches. 

PACS numbers: 05.70.Ln, 05.40.-a, 87.18. Tt, 87.10. Mn 



In recent years non-equilibrium fluctuations have be- 
come the center interest of stochastic thermodynamics 
[TJ Rare events in situations far from equilibrium 
can now be universally described by fluctuation theorems 
[3H5] . Intensive stochastic modelling of biophysical pro- 
cesses has started in the 1960s with Hill's cycle kinetics 
[H [7] (where the focus lies on averages) and is still a very 
active field of research, though attention has shifted to 
the importance of fluctuations, c/. Ref. [5]. 

Although Hill's methods were designed for biological 
problems, they have lead to general insights in statisti- 
cal physics ^ and mathematics pHl [TT] . It was under- 
stood that in non-equilibrium situations currents driven 
by non-trivial forces which are usually called affinities. 
Assigning these affinities to cycles on the network of 
states rather then to the states themselves, they have 
a direct thermodynamic interpretation [51 |S] . This hints 
at possible redundancy in the description and already 
Hill asked how and when a network reduction would be 
possible. In statistical physics, such reduction are of- 
ten summarized under the term of coarse- graining (CG) 
methods. It was recently shown for a special CG proce- 
dure that the ability to capture fluctations depends on 
the preservation of cycle topology of the network [T^] . 

In this Letter we present a new paradigm for coarse- 
graining of stochastic dynamics which preserves the non- 
equilibrium steady-state fluctuations of physical currents. 
Though we focus on biological situations the method can 
be universally applied to any finite model of stochastic 
thermodynamics. Our method is based on two require- 
ments: (i) The preservation of the algebraic and topologi- 
cal structure of the cycles of the network and (ii) locality. 
Further, (iii) the variation of the system's entropy along 
single trajectories [3] is considered to close the equations. 
To illustrate our method we consider the molecular motor 
kinesin which is able to perform directed motion along 
intracellular filaments called microtubuli [2 [T5HT5] . It 
has two heads (active sites) where adenosin triphosphate 
{ATP) is catalytically split into adenosin diphosphate 
(ADP) and inorganic phosphate (P). During the re- 




FIG. 1. (color online) (a) The catalytic cycle at kinesin's ac- 
tive site. In a four-stage process ATP binds to the Empty 
molecule and is split into = ADP -\- P. Then first P and 
later ADP is released. Since the release of P happens im- 
mediately after the splitting, often a three-stage process (b) 
is assumed where the state is absorbed into its neighbor 
states, (c) 6-state model of kinesin [2]. The dashed line is 
the mechanical transition that allows the motor to move, (d) 
Coarse-grained description with states 3 and 6 reduced. 

action, the molecule undergoes a conformational change 
that couples the two active sites and induces a mechan- 
ical transition. This allows the motor to "walk" in a 
"hand-over-hand" mechanism [T3]. 

The catalytic cycle of a single head (Fig. flk) is an 
example of a general enzymatic activity (Fig.T2]). This 
mesoscopic, stochastic description with its fluctuations 
has its origins in a microscopic, deterministically chaotic 
dynamics. Here, we investigate how a stochastic descrip- 
tion can be further simplified while preserving its fluctu- 
ations. 

Stochastic Formalism We consider a Markov process 
on a finite number of mesoscopic states i £ [1..N]. 
We call them mesoscopic, because for physical systems 
they amount to a partition of the underlying microscopic 
phase space. Transitions between the states i and j oc- 
cur with time-independent rate constants Wj > 0. For 
simplicity, we assume that there is only one mechanism 
by which the transition between two states can happen 
(although a generalization is possible 03 E])- Because of 
the reversibility of microscopic physical laws we demand 
dynamical reversibility, i.e., Wj > -i^ > 0. One can 
visualize the system as a graph G = {V, £) with the meso- 
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FIG. 2. (color online) (a) Illustration of the coarse-graining 
procedure that leaves cycle topology constant: Reduction of 
a "bridge" b will absorb the two dashed edges of the original 
graph (top) into one edge in the coarse-grained graph (bot- 
tom). This also leads to a change of rates along the edges £i 
and £r whereas edges £o connecting only unchanged vertices 
Vo remain unchanged, (b) Enzyme catalysis. An enzyme E 
binds a substrate S to form a complex E ■ S. The substrate 
is split to form products P and p where the latter is always 
released first. The dynamics can be modeled with four (top) 
and three (bottom) states. 



scopic states as vertices V and edges £ where w^j > 0. At 
time t the system will be in state i with a probabihty 
Pi{t). The flux from state i to state j is 



(1) 



Assuming connectedness of the network, a unique invari- 
ant distribution pi exists [TB] • In the steady state the net 
influx to each state equals the net outflux (Kirchhoff's 



law for currents I] 



■ PiW. 



= OVi. (2) 



The steady-state probability distribution can be calcu- 
lated explicitly as a polynomial in the rates Wj using a 
graph-theoretic matrix-tree method [7]. From now on, 
all variables are time-independent steady-state quanti- 
ties unless mentioned otherwise. Eq. ([2| can be used to 
decompose the steady fluxes using different sets of cycles 
on G O [Zl [9] . A cycle a of length Sa is an ordered set of 
vertices which form a self-avoiding closed path, where we 
identify cycles differing only by a cyclical permutation of 
vertices. In the following, when referring to cycles, we 
mean non-trivial cycles with > 3. Central quantities 
for this work are the edge affinities A^^ — log (^(f>^j / (f>-fj . 

Along a cycle a — (ii, 12, . . . , is„) they add up to cycle 
affinities 



k=l 



^log 

k=l 



(3) 



In physical models they take only values that reffect the 
macroscopic thermodynamic affinities [71 [5] . 

Coarse graining We suggest a coarse-graining proce- 
dure based on natural requirements: 
(ia) Cycle topology: The number and mutual connections 



of cycles are preserved. This determines possible targets 
for the reduction. 

(ib) Cycle affinities: The algebraic values of the affinity 
of any cycle is preserved. This yields the connection to 
the macroscopic level, i.e., thermodynamics. 

(ii) Locality: Fluxes, probabilities and observables may 
only change locally. This yields the connection to the 
microscopic level, i.e., the microscopic phase space. 

(iii) Trajectories: The system's entropy variation along 
trajectories is preserved. This is a natural choice and 
closes the equations. 

To demonstrate our method we address cycles that 
contain bridge states, like states 3 and 6 in Fig. [T] Bridges 
are connected to exactly two neighbor states that are 
themselves not connected to each other as shown in 
Fig. [2^. We use the index h for the target bridge state, 
and I or r for the left or right neighbor. Without loss of 
generality we assume that there is a positive net current 
I — ll = > flowing through the bridge from the left 
to the right neighbor state. The other states, which must 
not be influenced by the procedure (c/. (ii)) are summa- 
rized in the set Vb C T^. In the CG procedure we absorb 
the bridge into its neighbors leading to new states and 
r' and adjust the transition rates for the sets of edges £1 
and £r connecting I and r to the rest of the network. This 
has to be done in accordance with requirement (i)b yield- 
ing Aa = A'^ for any cycle in the network. Demanding 
the conservation of fluxes along any edge e £ £ / {ei,er} 
not belonging to the bridge, Eq. ([s]) yields 

= (4a) 

Any trajectory passing through the two edges {I, b) and 
(b, r) in the original model will be a trajectory through 
{I' , r') in the coarse-grained model. The change of a tra- 
jectory's entropy (starting from a steady-state ensemble) 
is the difference of the logarithms of the invariant distri- 
bution [5]. With that, (iii) leads to 



Pl'/Pr' ^Pl/Pv 



(4b) 



A priori, other closures of the form pii /pr' — c are also 
possible but lack the advantage of the stochastic thermo- 
dynamic interpretation. 

Together with the steady-state balance condition ^ 
and the locality assumption, the Eqs. ([4| uniquely de- 
termine all rate constants of the coarse-grained model. 
They can be found to be 

(5a) 
(5b) 
(5c) 
(5d) 



^n' = ^li fo'' i '^Vo,n e {I, r} , 
= <// for ieVo,ne {I, r} , 

wt, = {I + m)/{fpi), 
= {m)/{fpr), 



where 



/ = {Pl +Pr+Pb)/{Pl +Pr), 



(6a) 
(6b) 
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FIG. 3. (color online) Large-deviation function (LDF) I{s) 
for the entropy production (blue, center), the product (red, 
left) and the substrate (green, shifted to the right by sq = 2) 
association for the enzyme model (Fig.[2j3). All transition 
rates are unity but the release rate for the first product p 
which has the value 100. The LDF obtained from the fluc- 
tuation preserving coarse-graining method ("FPCG") over- 
laps almost perfectly with the original model ("ori") while 
the "naive" choice strongly changes fluctuations. 

One may argue that the method might not be practical, 
because one has to solve the original model to compute 
the correct rates for the simpler model. However, the 
situation is better than this: One only needs the steady 
state of the original model, which is accesible numerically 
to arbitrary precision by different means. In particular, if 
the rates of the original model are subject to inaccurities 
either by measurement or by modeling, the numerical 
error (if there is any) to the steady-state probabilities is 
neglegible. 

Single cycle: Simple catalysis model The easiest re- 
ducible topology is a cycle consisting of four states, e.g. 
the enzyme catalysis presented in Figs.jT^ and[2j3 where 
a transient intermediate state is identified as the bridge. 
A naive approximation for the new rates would be wj./ — 
w\w^{ti,) and wj", — w'[^w\{ti,) where {rb)^^ — + w'^ is 
the time constant for decay out of the bridge state. Hill 
[7] derived this result for three linearly connected states 
with the center one being transient. This choice is also 
the basis of the method proposed in Ref. [H], where its 
shortcomings have already been discussed. It fulfills 



(7) 



In Ref. [TS] it is interpreted as a condition on the local 
energy landscape and was used to reduce the enzymatic 
reaction of kinesin's active site (Fig.jT^). If all other rates 
are unchanged, Eq. ([t] preserves the affinity ([3]) of the 
cycle, eowever, in general, it leads to a non-local redis- 
tribution of steady-state probabilities. Hence, it does not 
comply with our method. This yields another motivation 



of Eq. (4b). One can easily check that this condition on 



the ratio of the new probabilities is the only one that 
leads to rates that fulfill Eq. ([7|. 

Fluctuations of physical observables Since the CG 
procedure changes the mesoscopic state space, also 
coarse-grained physical observables need to be defined. 
We consider physical currents, which are modeled by 



-Oj to each transition i j. The observable O 



for the case where the bridge state has been eliminated 
has entries 

I' 



of 

^3 



Oi + 0\ + {di-dr), (8a) 

O'^ + dn for i e Vb,n e {;,r}, (8b) 

O) fori,je1/o. (8c) 

The constants di and d^ depend on the microscopic dy- 
namics and the chosen partitioning of phase space. As we 
do not know these details, di and dr act as gauges that 
do not change the macroscopic observations. Hence, in 
general, we choose di = d^ = Q for simplicity. 

A special observable, where the gauge is prescribed 
differently, is the quantity 



log {w]/w^i^ . 



(9) 



It is determined solely by the mesoscopic transition rates 
and therefore takes a special role. Hill calls it the basic 
free energy difference between two mesoscopic states [7]. 
Recently, Seifert made this point more clear by stating 
the assumptions, under which it can be identified with 
the heat dissipated in the medium for transition i — >■ j ^ 
[17] . In an electric analogy it would be the electromotance 



One finds 



B? 



- Bl, (10a) 
-log/ forieT/o,rie{Z,r}, (10b) 
fori,jeFo. (IGc) 

Eq. ( |10a[ ) is the logarithm of Eq. ([7|. Eq. ( |10b[ ) states, 
that along the edges f„, n g {/, r}, there is an additional 
contribution — log/ to Bf, which is the same for both 
neighbors due to the closure (4b). Eq. (10c) expresses 



locality and is independent of the closure. We note, that 
non-current observables defined on the states rather than 
the transitions can also consistently be transformed [H] . 

To investigate the steady-state fluctuations of the 
observables we consider stochastic trajectories w = 
(cjojWi, - -t^nS) featuring iV^ jumps in a prescribed time 
= r. The time-averaged mean of current observable 
O along trajectory w, 



(11) 



1=1 



is a bounded random variable with the distribution func- 
tion /^. For T — oo it converges weakly and fulfills a 
large-deviation principle, i.e., 



/0(s)=exp(-r/o(s)+o(r)), 



(12) 



where o(t) stands for a term sublinear in t. Further, by 
the Gartner-Ellis theorem [50], the large-deviation func- 
tion Io{s) is the unique Legendre transform of the scaled 
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FIG. 4. (color online) Simulation and numerical results for dissipation rate, moving velocity and hydrolysis rate of the kinesin 
model. Data is shown for the original 6-state model ("ori"), a 5-state model with state 6 reduced ("6") and two 4-state models 
with state 6,3 or 6,4 reduced ("63" and "64"). The rate constants for the original model are taken from Ref. [2] describing 
the data in Ref. IT for chemical concentrations cadp = cp = catp = l^M and stepping size I ~ 8nm. The top row shows 
the sampled pdf for r ~ 1200s (opaque symbols) and r « 120s (transparent symbols). The bins with the width of half an 
empirical standard deviations are centered around the empirical mean. For the simulation we sampled = 5000 trajectories. 
The bottom row shows convergence of rescaled data (c/. Eq. (12|) to the rate function 7(s) (solid lines). 



cumulant generating function (SCGF) 



C(A)= lim -logE[exp(ATjO)] 



(13) 



where E[-] denotes the expectation value on the space 
of trajectories running for time r. The SCGF can be 
calculated fSD] as the dominant eigenvalue of the tilted 
transition matrix Wo{^) with entries 



{Wo)] 



w. 



exp (AO}) 



(14) 



To obtain numerical data for the rate function Io{s) of 
an observable O one follows the numerical scheme: Cal- 
culate Wo (A), determine its largest eigenvalue C(A) and 
find its Legendre transform with respect to A. For the 
last step, the algorithm described in Ref. [H] was used. 

Fig. [3] shows such numerical results for different phys- 
ical currents of the enzyme model (Fig. [2]d). Unlike the 
naive choice for the rates (dashed lines), our CG method 
(dotted lines) preserves steady-state averages and fluctu- 
ations of the original model (solid lines) to a very high 
degree. Bounds of the deviation can be obtained from 
inequalities for Perron- Frobenius eigenvalues P^ . 

Multiple cycles: Kinesin's network of states Our CG 
mechanism also captures fluctuations of observables for 
finite times and in models with multiple cycles. Fig- 
ure [ij^c) shows kinesin's network of states [5]. Using 
our method ([5| we reduced the bridge states appearing 
in the diagram. The result of a successive reduction of 
states 6 and 3 is shown in [ijd) . Additionally, we ana- 
lyzed models with only state 6 reduced and both states 6 
and 4 reduced. Figure |4] shows the results of simulations, 
and the convergence to the large-deviation rate function 
I{s) for the total dissipation rate (entropy production in 



the medium), the steady-state velocity and the hydroly- 
sis rate of the kinesin model. Already for finite times the 
agreement between the original and the reduced models 
is extremely good. The rate function I{s) for the dif- 
ferent models agree extremely well to the level of being 
indistinguishable in vicinity of the average value. Only in 
the far tails one can observe that the result is not exact. 

Discussion In this Letter we presented a new method 
to simplify stochastic dynamics on finite state spaces. 
A coarse-graining method that preserves the connection 
with both the underlying microscopic dynamics and the 
macroscopic thermodynamics was constructed. Here we 
considered bridge states, but the same ideas apply to 
tree-like subgraphs. Two biochemical examples where 
considered: A generic single-cycle model for enzymatic 
catalysis and a well-established multi-cycle model for the 
molecular motor kinesin. The reduction using the new 
paradigm preserves fiuctuations of current observables in 
great detail. Future work will focus on coarse-graining 
that includes changes of the cycle topology. 

The authors are indebted to L. Rondoni, A. Puglisi, 
S. Herminghaus and H. Touchette for many illuminating 
discussions. 
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